Abstract

We consider discrete time models for asset prices with a stationary 
volatility process. We aim at estimating the multivariate density of 
this process at a set of consecutive time instants. 

A Fourier type deconvolution kernel density estimator based on the 
logarithm of the squared process is proposed to estimate the volatility 
density. Expansions of the bias and bounds on the variance are derived.
1 Introduction

Suppose that we have price data $S_0, S_1, \ldots$ of a certain asset in a financial
market. Let $X$ be the log-return process, defined by $X_t = \log S_t - \log S_{t-1}$.
It is commonly believed that stochastic volatility models of the form

\[ X_t = \sigma_t Z_t \tag{1.1} \]

describe much of the observed behaviour of this type of data. Here $Z$ is typically
an i.i.d. noise sequence (often Gaussian) and at each time $t$ the random
variables $\sigma_t$ and $Z_t$ are independent. We will assume that the process $\sigma$ is
strictly stationary and that the (multivariate) marginal distributions of $\sigma$
have a density with respect to the Lebesgue measure on $(0, \infty)$. Our aim
is to construct a nonparametric estimator for the multivariate density of
$(\sigma_t, \ldots, \sigma_{t+p})$, and to study its asymptotic behaviour.

Models that are used in the literature to describe the volatility display
rather different invariant distributions. This observation lies at the basis
of the point of view pursued in this paper, that nonparametric
estimation procedures are sensible tools for gaining insight
into the behaviour of the volatility. Quite often in models that are used in
practice, the invariant distributions of $\sigma$ are unimodal. Since volatility
clustering is a frequently occurring phenomenon, it is hard to believe
that it can be explained by any of these models. Instead, one would expect
in such a case, for instance, the distribution of $(\sigma_t, \sigma_{t+1})$ to have a density
that has concentration regions around the diagonal, possibly with peaks at
certain clusters of low and high volatility, a phenomenon that may lead to,
for instance, bimodal one-dimensional marginal distributions. Nonparametric
density estimation could perhaps reveal such a shape of the invariant density
of the volatility.

We will distinguish two classes of models in this paper. In both of them
we will assume that the noise sequence is standard Gaussian and that $\sigma$ is
a strictly stationary, positive process satisfying a certain mixing condition.
The way in which the bivariate process $(\sigma, Z)$, in particular its dependence
structure, is further modelled differs however. In the first class of models
that we consider, we assume that the process $\sigma$ is predictable with respect
to the filtration $\mathcal{F}_t$ generated by the process $Z$. Note that $\sigma_t$ is independent
of $Z_t$ for each fixed time $t$. We furthermore have that (assuming that the
unconditional variances are finite) $\sigma_t^2$ is equal to the conditional variance
of $X_t$ given $\mathcal{F}_{t-1}$. This class of models has become quite popular in the
econometrics literature. Financial data such as log-returns of stock prices or
exchange rates are believed to share a number of stylized features, including
for instance heavy-tailedness and long-range dependence. Models of the type
(1.1) have been proposed to capture those features. A well-known family
included in the class (1.1) is the family of GARCH models, introduced by
Bollerslev (1986). For the GARCH($p$, $q$)-model the sequence $\{\sigma_t\}$ in (1.1) is
assumed to satisfy the equation

\[ \sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i X_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2, \tag{1.2} \]

where the $\alpha_i$ and $\beta_j$ are nonnegative constants. Under suitable assumptions,
see Bougerol and Picard (1992), GARCH processes are stationary and the
statistical problem in this case would be to estimate the coefficients $\alpha_i$ and
$\beta_j$ in (1.2).
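As an illustration only, the recursion (1.2) with $p = q = 1$ can be simulated as in the sketch below. The parameter values, the seed and the function name `simulate_garch11` are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def simulate_garch11(n, alpha0=0.1, alpha1=0.1, beta1=0.8, seed=0):
    """Simulate X_t = sigma_t * Z_t with
    sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2 + beta1 * sigma_{t-1}^2.

    Illustrative parameter values; requires alpha1 + beta1 < 1.
    """
    rng = random.Random(seed)
    # Start the recursion at the unconditional variance.
    sigma2 = alpha0 / (1.0 - alpha1 - beta1)
    xs, sigma2s = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)          # standard Gaussian noise Z_t
        x = math.sqrt(sigma2) * z        # log-return X_t
        xs.append(x)
        sigma2s.append(sigma2)
        # sigma_{t+1}^2 depends only on the past, so sigma is predictable.
        sigma2 = alpha0 + alpha1 * x * x + beta1 * sigma2
    return xs, sigma2s


xs, sigma2s = simulate_garch11(1000)
```

Because $\sigma_{t+1}^2$ is computed from $X_t$ and $\sigma_t^2$ only, the simulated volatility is a function of past noise, which is the predictability property of the first model class.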

In the second class of models that we consider, we assume that the
whole process $\sigma$ is independent of the noise process $Z$. In this case, the
natural underlying filtration $\mathcal{F} = \{\mathcal{F}_t\}_{t \ge 0}$ is generated by the two processes
$Z$ and $\sigma$ in the following way. For each $t$ the $\sigma$-algebra $\mathcal{F}_t$ is generated by
$Z_s$, $s \le t$, and $\sigma_s$, $s \le t + 1$. This choice of the filtration enforces $\sigma$ to
be predictable. As in the first model the process $X$ becomes a martingale
difference sequence and we have again (assuming that the unconditional
variances are finite) that $\sigma_t^2$ is the conditional variance of $X_t$ given $\mathcal{F}_{t-1}$. An
example of such a model is given in De Vries (1991), where $\sigma$ is generated
as an AR(1) process with $\alpha$-stable noise ($\alpha \in (0, 1)$).

As we said before, we do not want to make a parametric assumption
such as (1.2), but we still want to measure the volatility of the data somehow.
In the present paper we propose a nonparametric statistical procedure for 
this problem. Using ideas from deconvolution theory, we will propose a 
procedure for the estimation of the marginal density at a fixed point. To 
assess the quality of our procedure, we will derive expansions of the bias and 
bounds on the variance. This will be done separately for the two kinds of 
model classes outlined above. 

2 Primer on kernel type deconvolution 

We briefly review the construction of the deconvolution kernel density
estimator based on i.i.d. observations, see also Wand and Jones (1995). For
simplicity we consider in this section the univariate case only. Recall that
the characteristic function or Fourier transform of a density function $g$ is
defined by

\[ \phi_g(t) = \mathrm{E}\, e^{itX} = \int_{-\infty}^{\infty} e^{itx} g(x)\,dx, \tag{2.1} \]

where $X$ is a random variable with density function $g$. In the standard
deconvolution setting the random variable $X$ is equal to the sum of two
independent random variables, say $Y$, with unknown density $f$, and $Z$, with
known density $k$. So $g$ is the convolution of $f$ and $k$ and

\[ \phi_g(t) = \mathrm{E}\, e^{itX} = \mathrm{E}\, e^{it(Y+Z)} = \mathrm{E}\, e^{itY}\, \mathrm{E}\, e^{itZ} = \phi_f(t)\phi_k(t). \tag{2.2} \]
The objective is to estimate $f$ from i.i.d. observations $X_1, \ldots, X_n$
having density $g$. In identity (2.2) we know $\phi_k(t)$ and we can estimate $\phi_g(t)$
by the characteristic function of a kernel estimator $g_{nh}$ of $g$, given by

\[ g_{nh}(x) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{h}\, w\Big(\frac{x - X_j}{h}\Big), \tag{2.3} \]

where $w$ is an integrable function with integral one, called the kernel
function, and $h > 0$ is a positive number, called the bandwidth, governing the
curvature of the estimate. The kernel estimator itself is also a convolution of
the empirical distribution function $G_n$ of the observations and the rescaled
kernel function $w_h(x) = w(x/h)/h$. So, with $\phi_w$ the Fourier transform of $w$,

\[ \phi_{g_{nh}}(t) = \phi_w(ht) \int_{-\infty}^{\infty} e^{itx}\,dG_n(x) = \phi_w(ht)\,\phi_{\mathrm{emp}}(t), \tag{2.4} \]

where

\[ \phi_{\mathrm{emp}}(t) = \int_{-\infty}^{\infty} e^{itx}\,dG_n(x) = \frac{1}{n} \sum_{j=1}^{n} e^{itX_j} \tag{2.5} \]

is called the empirical characteristic function. From (2.2) we see that

\[ \frac{\phi_w(ht)\,\phi_{\mathrm{emp}}(t)}{\phi_k(t)} \tag{2.6} \]

is an obvious candidate to estimate $\phi_f$. Applying an inverse Fourier
transform we obtain an estimator of $f$. Define the estimator $f_{nh}$ of $f$ as

\[ f_{nh}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \frac{\phi_w(ht)\,\phi_{\mathrm{emp}}(t)}{\phi_k(t)}\,dt. \tag{2.7} \]



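The inversion is allowed if the function (2.6) is integrable. In general
this is not guaranteed. However, to enforce integrability, we assume that $\phi_w$
has a bounded support. Note that (2.7) can be rewritten as

\[ f_{nh}(x) = \frac{1}{nh} \sum_{j=1}^{n} \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\phi_w(s)}{\phi_k(s/h)}\, e^{-is(x - X_j)/h}\,ds = \frac{1}{nh} \sum_{j=1}^{n} v_h\Big(\frac{x - X_j}{h}\Big), \tag{2.8} \]

where

\[ v_h(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\phi_w(s)}{\phi_k(s/h)}\, e^{-isx}\,ds. \tag{2.9} \]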
It is easy to see that the function $v_h$, and hence the estimator $f_{nh}(x)$, is real
valued. Indeed, taking complex conjugates, we get

\[
\begin{aligned}
\overline{v_h(x)} &= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{isx}\, \overline{\phi_w(s)}\,/\,\overline{\phi_k(s/h)}\,ds \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{isx}\, \phi_w(-s)/\phi_k(-s/h)\,ds \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-isx}\, \phi_w(s)/\phi_k(s/h)\,ds = v_h(x).
\end{aligned}
\]
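The conjugation argument rests on the identity $\phi(-t) = \overline{\phi(t)}$, which holds for any characteristic function and in particular for the empirical one in (2.5). The following minimal sketch checks this numerically; the helper name `empirical_cf`, the Gaussian sample and the evaluation point $t = 0.7$ are assumptions made for illustration.

```python
import cmath
import random


def empirical_cf(t, data):
    """Empirical characteristic function (2.5): (1/n) * sum_j exp(i t X_j)."""
    return sum(cmath.exp(1j * t * x) for x in data) / len(data)


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]   # illustrative data only

phi_plus = empirical_cf(0.7, sample)
phi_minus = empirical_cf(-0.7, sample)
# Conjugate symmetry: phi_emp(-t) equals the complex conjugate of phi_emp(t).
symmetry_gap = abs(phi_minus - phi_plus.conjugate())
```

The same symmetry for $\phi_w$ and $\phi_k$ is what allows the substitution $s \mapsto -s$ in the last step of the display above.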



A popular performance measure for deconvolution kernel estimators
is the mean squared error (MSE). The MSE of $f_{nh}(x)$ is defined as
$\mathrm{E}\,(f_{nh}(x) - f(x))^2$. To obtain asymptotic expansions for the MSE, we need
expansions for the bias and variance of the estimator. The expectation of
$f_{nh}(x)$ is equal to the expectation of an ordinary kernel density estimator of
$f$ based on observations from $f$. We have

\[
\mathrm{E}\, f_{nh}(x) = \int \frac{1}{h}\, w\Big(\frac{x - u}{h}\Big) f(u)\,du
= f(x) + \tfrac{1}{2} h^2 \int u^2 w(u)\,du\; f''(x) + o(h^2),
\]

as $n \to \infty$, $h \to 0$ and $nh \to \infty$, provided that $w$ is symmetric and $f$
satisfies some smoothness conditions, essentially twice differentiability at $x$.
The asymptotic variance of $f_{nh}(x)$ depends on the tails of the characteristic
function of the density $k$. The smoother $k$, the faster the tails of the
characteristic function vanish and the larger the asymptotic variance, see
for instance Fan (1991).

3 Construction of the estimators 

We consider the model (1.1), so $X_t = \sigma_t Z_t$. If we square this equation and
take logarithms we get

\[ \log X_t^2 = \log \sigma_t^2 + \log Z_t^2. \tag{3.1} \]

Recall that under our assumptions for each $t$ the random variables $\sigma_t$ and
$Z_t$ are independent. The density of $\log Z_t^2$, denoted by $k$, is given by

\[ k(x) = \frac{1}{\sqrt{2\pi}}\, e^{\frac{1}{2}x}\, e^{-\frac{1}{2}e^{x}}. \tag{3.2} \]



Its graph is given in Figure 1 below. 
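As a quick sanity check of (3.2), the sketch below integrates $k$ numerically and compares its mean with the known value $\mathrm{E} \log Z_t^2 = -(\gamma + \log 2) \approx -1.2704$ for a standard Gaussian $Z_t$, where $\gamma$ is Euler's constant. The integration range, the step count and the helper names are assumptions chosen for this illustration.

```python
import math


def k_density(x):
    """Density (3.2) of log(Z^2) for a standard normal Z."""
    return math.exp(0.5 * x - 0.5 * math.exp(x)) / math.sqrt(2.0 * math.pi)


def trapezoid(func, a, b, steps):
    """Composite trapezoidal rule on [a, b]."""
    width = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * width)
    return total * width


# The left tail decays like exp(x/2), the right tail doubly exponentially,
# so the range [-40, 6] captures essentially all of the mass.
total_mass = trapezoid(k_density, -40.0, 6.0, 46000)
mean_log_z2 = trapezoid(lambda x: x * k_density(x), -40.0, 6.0, 46000)
```

The slow exponential decay of the left tail is visible here: the integration range has to extend far to the left before the truncated mass becomes negligible.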

As in Section 2, it seems reasonable to use a deconvolution kernel density
estimator to estimate the unknown density $f$ of $\log \sigma_t^2$. An estimate of
the density of $\sigma_t^2$ or $\sigma_t$ can then be obtained by a simple transformation.

Figure 1: The density function $k$ of $\log Z_t^2$.

Computing the characteristic function $\phi_k$ of $\log Z_t^2$ we get, with $k(x)$ as in
(3.2),

\[ \phi_k(t) = \int_{-\infty}^{\infty} e^{itx} k(x)\,dx = \frac{1}{\sqrt{\pi}}\, 2^{it}\, \Gamma(\tfrac{1}{2} + it), \tag{3.3} \]

where the gamma function $\Gamma$ is defined for all complex $z$ with positive real
part by

\[ \Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\,dt. \]

The graphs of $\mathrm{Re}(\phi_k)$, $\mathrm{Im}(\phi_k)$ and $|\phi_k|$ are given in Figures 2 and 3.

Figure 2: The real and imaginary part of the characteristic function $\phi_k$.

Figure 3: The modulus of $\phi_k$.

For the model (1.1) this leads to the estimator

\[ f_{nh}(x) = \frac{1}{nh} \sum_{j=1}^{n} v_h\Big(\frac{x - \log X_j^2}{h}\Big) \tag{3.4} \]

of the density $f$ of $\log \sigma_t^2$, with $v_h(x)$ as in (2.9). Note that, as in the
previous section, this estimator is real valued.
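A minimal numerical sketch of the univariate estimator is given below, under several assumptions that are not part of the construction above: the complex gamma function in (3.3) is evaluated with a standard Lanczos approximation, the kernel transform is taken to be $\phi_w(s) = (1 - s^2)^3$ on $[-1, 1]$, the integral defining $v_h$ is approximated by a midpoint rule, and the data are simulated with i.i.d. standard normal $\log \sigma_t^2$. All function names are hypothetical.

```python
import cmath
import math
import random

# Lanczos coefficients (g = 7), a standard approximation of the gamma function.
_G = 7.0
_COEF = [0.99999999999980993, 676.5203681218851, -1259.1392167224028,
         771.32342877765313, -176.61502916214059, 12.507343278686905,
         -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7]


def cgamma(z):
    """Gamma(z) for complex z with Re z >= 1/2 (no reflection step needed)."""
    z = z - 1.0
    acc = _COEF[0]
    for i in range(1, len(_COEF)):
        acc += _COEF[i] / (z + i)
    t = z + _G + 0.5
    return math.sqrt(2.0 * math.pi) * t ** (z + 0.5) * cmath.exp(-t) * acc


def phi_k(t):
    """Characteristic function (3.3) of log Z^2."""
    return cmath.exp(1j * t * math.log(2.0)) * cgamma(0.5 + 1j * t) / math.sqrt(math.pi)


def phi_w(s):
    """Assumed kernel transform (1 - s^2)^3 with support [-1, 1]."""
    return (1.0 - s * s) ** 3 if abs(s) <= 1.0 else 0.0


def make_v_h(h, nodes=400):
    """Return a function approximating v_h of (2.9) by a midpoint rule on [0, 1]."""
    ds = 1.0 / nodes
    grid = [(i + 0.5) * ds for i in range(nodes)]
    ratio = [phi_w(s) / phi_k(s / h) for s in grid]

    def v_h(x):
        total = 0.0
        for s, r in zip(grid, ratio):
            total += (r * cmath.exp(-1j * s * x)).real
        # The integrand at -s is the conjugate of the integrand at s, so
        # (1/2pi) * integral over [-1, 1] = (1/pi) * Re(integral over [0, 1]).
        return total * ds / math.pi

    return v_h


def deconv_density(x, log_x2, h, v_h):
    """Estimator (3.4) evaluated at the point x."""
    return sum(v_h((x - u) / h) for u in log_x2) / (len(log_x2) * h)


rng = random.Random(0)
# Simulated data: log X_t^2 = log sigma_t^2 + log Z_t^2 (illustrative model).
log_x2 = [rng.gauss(0.0, 1.0) + math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(300)]
h = 0.6
v_h = make_v_h(h)
estimate_at_zero = deconv_density(0.0, log_x2, h, v_h)
```

With so few observations the estimate is noisy, which is in line with the slow rates discussed in the next section; the sketch is meant to show the mechanics of (2.9) and (3.4) rather than a tuned implementation.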

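The expression for the estimator of the density of the $p$-dimensional
random vector $(\log \sigma_t^2, \ldots, \log \sigma_{t-p+1}^2)$ is similar. We first introduce some
auxiliary notation. Let $p$ be fixed and write $\mathbf{x}_j$ for a vector $(x_j, \ldots, x_{j-p+1})$.
We use similar boldface expressions for other (random) vectors. The kernel
$\mathbf{w}$ that we will use in the multivariate case is just a product kernel,
$\mathbf{w}(\mathbf{x}) = \prod_{j=1}^{p} w(x_j)$. Likewise $\mathbf{k}(\mathbf{x}) = \prod_{j=1}^{p} k(x_j)$. Then with $\mathbf{v}_h$ defined by

\[ \mathbf{v}_h(\mathbf{x}) = \frac{1}{(2\pi)^p} \int_{\mathbb{R}^p} \frac{\phi_{\mathbf{w}}(\mathbf{s})}{\phi_{\mathbf{k}}(\mathbf{s}/h)}\, e^{-i\,\mathbf{s} \cdot \mathbf{x}}\,d\mathbf{s}, \tag{3.5} \]

where $\mathbf{s} \in \mathbb{R}^p$ and $\cdot$ denotes inner product, the multivariate density estimator
is given by

\[ f_{nh}(\mathbf{x}) = \frac{1}{(n - p + 1)h^p} \sum_{j=p}^{n} \mathbf{v}_h\Big(\frac{\mathbf{x} - \log \mathbf{X}_j^2}{h}\Big), \tag{3.6} \]

where we use $\log \mathbf{X}_j^2$ to denote the vector $(\log X_j^2, \ldots, \log X_{j-p+1}^2)$.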



4 Asymptotics 

The bias of the deconvolution estimator described in Section 2 will be seen to
be the same as the bias of a kernel density estimator based on independent
observations from $f$. Hence, under standard smoothness assumptions, it is
of order $h^2$ as $h \to 0$. The variance of this type of deconvolution estimator
heavily depends on the rate of decay to zero of $|\phi_k(t)|$ as $|t| \to \infty$. The
faster the decay the larger the asymptotic variance. In other words, the
smoother $k$ the harder the estimation problem. This follows for instance for
i.i.d. observations from results in Fan (1991) and for stationary observations
from the work of Masry (1991, 1993a, b).



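The rate of decay of $|\phi_k(t)|$ for the density (3.2) is given by Lemma 5.1
in Section 5, where we show that

\[ |\phi_k(t)| = \sqrt{2}\, e^{-\frac{1}{2}\pi|t|} \big(1 + O(1/|t|)\big) \quad \text{as } |t| \to \infty. \tag{4.1} \]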



By the similarity of the tail of this characteristic function to the tail of a 
Cauchy characteristic function we can expect the same order of the mean 
squared error as in Cauchy deconvolution problems, where it decreases log- 
arithmically in n, cf. Fan (1991) for results on i.i.d. observations. Note that 
this rate, however slow, is faster than the one for normal deconvolution. 
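The leading term in (4.1) can be checked against an exact expression. By the reflection formula for the gamma function, $|\Gamma(\tfrac{1}{2} + it)|^2 = \pi / \cosh(\pi t)$, so (3.3) gives $|\phi_k(t)| = \cosh(\pi t)^{-1/2}$. The sketch below, with hypothetical function names and arbitrarily chosen evaluation points, compares the two.

```python
import math


def abs_phi_k_exact(t):
    """|phi_k(t)| = cosh(pi t)^(-1/2), from |Gamma(1/2 + it)|^2 = pi / cosh(pi t)."""
    return 1.0 / math.sqrt(math.cosh(math.pi * t))


def abs_phi_k_leading(t):
    """Leading term sqrt(2) * exp(-pi |t| / 2) of expansion (4.1)."""
    return math.sqrt(2.0) * math.exp(-0.5 * math.pi * abs(t))


# Ratio exact / leading term; it equals (1 + exp(-2 pi |t|))^(-1/2).
ratios = {t: abs_phi_k_exact(t) / abs_phi_k_leading(t) for t in (0.5, 1.0, 2.0, 3.0)}
```

The ratio tends to one very quickly, so the exponential rate $e^{-\pi|t|/2}$ already describes $|\phi_k|$ well for moderate $|t|$.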

In the model (3.1) the sequence $\{\log X_t^2\}$ is not independent, so
results on the asymptotic behavior of the kernel estimator of Section 2 are
not directly applicable. More general deconvolution problems, in which the
i.i.d. assumption is relaxed, have also been studied in the literature.
For instance, the deconvolution model $X_j = Y_j + Z_j$, where $\{(Y_j, Z_j)\}$ is a
stationary sequence and the sequences $\{Z_j\}$ and $\{Y_j\}$ are independent, has been
treated by Masry (1991, 1993a, b).

Expansions for the variance of the deconvolution kernel estimator have
been derived under several mixing conditions. Under the assumption that
the volatility process is independent of the noise sequence, the model (3.1)
fits into this scheme. We will obtain similar results for the estimator when
$\sigma$ (as a process) is not independent of $Z$, but only predictable with respect
to the filtration generated by $Z$.

Let us define the mixing conditions. For a certain process $\{X_j\}$ let
$\mathcal{F}_a^b$ be the $\sigma$-algebra of events generated by the random variables $X_j$,
$j = a, \ldots, b$. Let the mixing coefficient $\alpha_k$ be defined by

\[ \alpha_k = \sup_{A \in \mathcal{F}_{-\infty}^{0},\; B \in \mathcal{F}_{k}^{\infty}} |P(AB) - P(A)P(B)|. \tag{4.2} \]

We call a process $\{X_j\}$ strongly mixing if $\alpha_k \to 0$ as $k \to \infty$.

To obtain expansions for the bias and variance we also need conditions
on the kernel function $w$ such as bounded support of its characteristic
function $\phi_w(t)$. Moreover, the rate of decay to zero of $\phi_w(t)$ at the boundary of
its support turns up in the asymptotics. The complete list of assumptions
on $w$ that we use is the following.

Condition W. Let $w$ be a real symmetric function satisfying

1. $\int_{-\infty}^{\infty} |w(u)|\,du < \infty$,

2. $\int_{-\infty}^{\infty} w(u)\,du = 1$,

4. $\lim_{|u| \to \infty} w(u) = 0$,

5. $\phi_w$, the characteristic function of $w$, has support $[-1, 1]$,

6. $\phi_w(1 - t) = A t^{\alpha} + o(t^{\alpha})$, as $t \downarrow 0$, for some $\alpha > 0$.

Note that by Fourier inversion these conditions imply that w is bounded 
and Lipschitz. More precisely, we have 

|to(a:)| < — an d \w(x + u) — w(x)\ < — \u\. (4-3) 
An example of such a kernel, from Wand (1998), with $\alpha = 3$ and $A = 8$, is

\[ w(x) = \frac{48x(x^2 - 15)\cos x - 144(2x^2 - 5)\sin x}{\pi x^7}. \tag{4.4} \]

It has characteristic function

\[ \phi_w(t) = (1 - t^2)^3, \quad |t| \le 1. \tag{4.5} \]
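The pair (4.4) and (4.5) can be checked numerically: Fourier inversion of (4.5) gives $w(x) = \frac{1}{\pi} \int_0^1 (1 - t^2)^3 \cos(tx)\,dt$, and in particular $w(0) = 16/(35\pi)$. The sketch below compares this integral, computed with Simpson's rule, to the closed form; the function names and the number of integration steps are assumptions for illustration.

```python
import math


def w_closed_form(x):
    """Kernel (4.4); the formula is used for x != 0 only."""
    num = (48.0 * x * (x * x - 15.0) * math.cos(x)
           - 144.0 * (2.0 * x * x - 5.0) * math.sin(x))
    return num / (math.pi * x ** 7)


def w_by_inversion(x, steps=2000):
    """(1/pi) * integral over [0, 1] of (1 - t^2)^3 cos(t x), by Simpson's rule."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * width
        weight = 1.0 if i in (0, steps) else (4.0 if i % 2 else 2.0)
        total += weight * (1.0 - t * t) ** 3 * math.cos(t * x)
    return total * width / (3.0 * math.pi)


gap_at_2 = abs(w_closed_form(2.0) - w_by_inversion(2.0))
gap_at_5 = abs(w_closed_form(5.0) - w_by_inversion(5.0))
w_at_zero = w_by_inversion(0.0)
```

Near $x = 0$ the closed form suffers from cancellation between large terms, so in numerical work the integral representation (or a series expansion) is the safer choice there.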

The next theorem, whose proof can be found in Section 5, establishes the
expansion of the bias and an order bound on the variance of our estimator
under a strong mixing condition. Under broad conditions this mixing
condition is satisfied if the process $\sigma$ is a Markov chain, since then convergence of
$\alpha_k$ to zero takes place at an exponential rate, see Theorems 4.2 and
4.3 of Bradley (1985) for precise statements. Similar behaviour occurs for
ARMA processes with absolutely continuous distributions of the noise terms
(Bradley (1985), Example 6.1).

Theorem 4.1. Assume that the process $X$ is strongly mixing with
coefficient $\alpha_k$ satisfying

\[ \sum_{j=1}^{\infty} \alpha_j^{\beta} < \infty, \]

for some $\beta \in (0, 1)$. Let the kernel function $w$ satisfy Condition W and let
the density $f$ of the $p$-vector $(\log \sigma_1^2, \ldots, \log \sigma_p^2)$ be bounded and twice
continuously differentiable with bounded second order partial derivatives. Assume
that $\sigma$ is a predictable process with respect to the filtration generated by the
process $Z$. Then we have for the estimator of the multivariate density defined
as in (3.6) and $h \to 0$

\[ \mathrm{E}\, f_{nh}(\mathbf{x}) = f(\mathbf{x}) + \tfrac{1}{2} h^2 \int \mathbf{u}^{T} \nabla^2 f(\mathbf{x})\, \mathbf{u}\, \mathbf{w}(\mathbf{u})\,d\mathbf{u} + o(h^2) \tag{4.6} \]

and

\[ \mathrm{Var}\, f_{nh}(\mathbf{x}) = O\Big(\frac{1}{n} \big(h^{2\alpha - \beta} e^{\pi/h}\big)^{p}\Big). \tag{4.7} \]
Theorem 4.2. Assume that the process $\sigma$ is strongly mixing with coefficient
$\alpha_k$ satisfying

\[ \sum_{j=1}^{\infty} \alpha_j^{\beta} < \infty, \]

for some $\beta \in (0, 1)$. Let the kernel function $w$ satisfy Condition W and let
the density $f$ of the $p$-vector $(\log \sigma_1^2, \ldots, \log \sigma_p^2)$ be bounded and twice
continuously differentiable with bounded second order partial derivatives. Assume
furthermore that $\sigma$ and $Z$ are independent processes. Then the multivariate
density estimator $f_{nh}$ satisfies the same bias expansion as in Theorem 4.1.
For the variance we have the sharper bound

\[ \mathrm{Var}\, f_{nh}(\mathbf{x}) = O\Big(\frac{1}{n} \big(h^{2\alpha} e^{\pi/h}\big)^{p}\Big). \tag{4.8} \]



Remark 4.3. Because of the exponential factor in the variance bound, in
order to obtain consistency, one has to take essentially $h \ge \pi / \log n$, see
also Stefanski (1990) for a related problem. On the other hand we would
like to minimize the bias, so the choice $h = \pi / \log n$ is optimal. Both bias
and variance decay at a logarithmic rate for this choice of bandwidth. This
seems disappointing, however Fan (1991) shows for the i.i.d. situation of
Section 2 that we cannot expect anything better.
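To get a feeling for how slowly this bandwidth shrinks, the sketch below tabulates $h = \pi / \log n$ and the corresponding bias order $h^2$ for a few sample sizes. The function name and the chosen values of $n$ are illustrative assumptions.

```python
import math


def log_bandwidth(n):
    """Bandwidth h = pi / log(n) from Remark 4.3 (requires n > 1)."""
    return math.pi / math.log(n)


# With this choice exp(pi / h) = n, so for p = 1 the factor
# (1/n) * h^(2 alpha) * exp(pi / h) in the variance bound reduces to h^(2 alpha).
for n in (10**3, 10**6, 10**9):
    h = log_bandwidth(n)
    print(n, round(h, 4), round(h * h, 4))
```

Even for $n = 10^9$ the bandwidth is still of order $0.15$, which illustrates the logarithmic rate mentioned in the remark.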



Remark 4.4. Notice that the results in Masry (1993a, b) establishing strong
consistency, rates of convergence and asymptotic normality are not useful
here, because the condition that $\phi_k$ has either purely real or purely imaginary
tails is not satisfied.



Remark 4.5. Note that our assumptions in Theorem 4.1 are slightly different
from those of Masry (1991). One of the essential facts that are used in
the proof is the mixing property of $X$. If $\sigma$ and $Z$ are independent processes
this is implied by a similar assumption on the $\sigma$ process itself as in Masry
(1991).



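Remark 4.6. In the case where the processes $\sigma$ and $Z$ are independent,
the estimators $f_{nh}(x)$ have the following property:

\[ \tilde{f}_{nh}(x) := \mathrm{E}\,[f_{nh}(x) \mid \mathcal{F}_{\sigma}] = \frac{1}{nh} \sum_{j=1}^{n} w\Big(\frac{x - \log \sigma_j^2}{h}\Big), \tag{4.9} \]

where $\mathcal{F}_{\sigma}$ denotes the $\sigma$-algebra generated by the whole process $\sigma$. Thus
the $\tilde{f}_{nh}(x)$ would be ordinary kernel density estimators, if the $\sigma_j^2$ could be
observed.

Equation (4.9) is seen to be true as follows. Write $U_j = \log X_j^2$ and use
similar notation for $\zeta_j = \log Z_j^2$ and $\tau_j = \log \sigma_j^2$. Then

\[
\begin{aligned}
\mathrm{E}\,\Big[v_h\Big(\frac{x - U_j}{h}\Big) \,\Big|\, \mathcal{F}_{\sigma}\Big]
&= \frac{1}{2\pi} \int \frac{\phi_w(s)}{\phi_k(s/h)}\, e^{-is(x - \tau_j)/h}\; \mathrm{E}\, e^{is\zeta_j/h}\,ds \\
&= \frac{1}{2\pi} \int \phi_w(s)\, e^{-is(x - \tau_j)/h}\,ds \\
&= w\Big(\frac{x - \tau_j}{h}\Big).
\end{aligned}
\]

The result now follows. Of course, the analogous statement for the
multivariate density estimator is equally true. One has

\[ \tilde{f}_{nh}(\mathbf{x}) := \mathrm{E}\,[f_{nh}(\mathbf{x}) \mid \mathcal{F}_{\sigma}] = \frac{1}{(n - p + 1)h^p} \sum_{j=p}^{n} \mathbf{w}\Big(\frac{\mathbf{x} - (\log \sigma_j^2, \ldots, \log \sigma_{j-p+1}^2)}{h}\Big). \tag{4.10} \]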



Remark 4.7. Better bounds on the asymptotic variance than in Theorem 4.1
can be obtained under stronger mixing conditions. Consider for
instance uniform mixing. In this case the mixing coefficient $\phi_t$ is defined for
$t > 0$ as

\[ \phi_t = \sup |P(A \mid B) - P(A)|. \tag{4.11} \]

Similar to strong mixing, a process is called uniform mixing if $\phi_t \to 0$ for
$t \to \infty$. Obviously, uniform mixing implies strong mixing. As a matter of
fact, one has the relation

\[ \alpha_t \le \tfrac{1}{2} \phi_t. \]

See Doukhan (1994) for this inequality and many other mixing properties.
If $\{\sigma_t\}$ is uniform mixing with coefficient $\phi$ satisfying $\sum_{j=1}^{\infty} \phi(j)^{1/2} < \infty$,
then the variance bound (4.7) can be replaced with

\[ \mathrm{Var}\, f_{nh}(\mathbf{x}) = O\Big(\frac{1}{n} \big(h^{2\alpha} e^{\pi/h}\big)^{p}\Big). \tag{4.12} \]



The proof of the latter bound runs similarly to the strong-mixing bound as
given in Section 5. The essential difference is that in equation (5.5) we use
Theorem 17.2.3 of Ibragimov and Linnik (1971) with r = instead of Deo's
(1973) lemma, as in the proof of Theorem 2 in Masry (1983). The result is
that we can now bound the term $M_{nh}$ of equation (5.5) by a constant times

\[ \frac{1}{nh^{2p}} \sum_{j=1}^{\infty} \phi(j)^{1/2}\; \mathrm{E}\, W_0^2. \]

After this step the proof is essentially unchanged. Use
the estimate $\mathrm{E}\, W_0^2 \le C h^p \|\mathbf{v}_h\|_2^2$ to finish the proof. Notice that this bound
on the variance is of the same order as the one we obtained in Theorem 4.2,
where $\sigma$ was only assumed to be strongly mixing. This bound cannot be
improved upon by strengthening the assumption to uniform mixing.
Remark 4.8. An example of an observed process that is strongly mixing
and that belongs to the first model class is a GARCH($p$, $q$) process. It has
been shown in Carrasco and Chen (2002) (see also Boussama (1998)) that
such a process is $\beta$-mixing with exponentially decaying $\beta$-mixing coefficients.
Hence this process is also $\alpha$-mixing, since the $\beta$-mixing coefficient
$\beta_k = \mathrm{E}\, \mathrm{ess\,sup}\{|P(A \mid \mathcal{F}_{-\infty}^{0}) - P(A)| : A \in \mathcal{F}_{k}^{\infty}\}$ satisfies the inequality $2\alpha_k \le \beta_k$
(see Doukhan (1994)). Notice that we also have that the assumption of
Theorem 4.1 on the $\alpha$'s is satisfied in this case.

5 Proofs 

All the estimators that we proposed involve the functions $\phi_k$ and $\phi_w$. For
these functions and related ones we need expansions and order estimates.
These are collected in the lemmas of this section.

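Lemma 5.1. For $|t| \to \infty$ we have

\[
\begin{aligned}
|\phi_k(t)| &= \sqrt{2}\, e^{-\frac{1}{2}\pi|t|} \big(1 + O(1/|t|)\big), \\
\mathrm{Re}\, \phi_k(t) &= |\phi_k(t)| \big[\cos\big(t \log \sqrt{1 + 4t^2} - t\big) + O(1/|t|)\big], \\
\mathrm{Im}\, \phi_k(t) &= |\phi_k(t)| \big[\sin\big(t \log \sqrt{1 + 4t^2} - t\big) + O(1/|t|)\big].
\end{aligned}
\]

Proof. By the Stirling formula for the complex gamma function, cf.
Abramowitz and Stegun (1964) Chapter 6, we have

\[ \log \Gamma(z) = (z - \tfrac{1}{2}) \log z - z + \tfrac{1}{2} \log 2\pi + O(1/|z|), \tag{5.1} \]

as $|z| \to \infty$ and $|\mathrm{Arg}\, z| < \pi - \delta$ for some $\delta > 0$. So for $z = \tfrac{1}{2} + it$ and $|t| \to \infty$
we get

\[
\begin{aligned}
\log \Gamma(\tfrac{1}{2} + it) &= it \log(\tfrac{1}{2} + it) - (\tfrac{1}{2} + it) + \tfrac{1}{2} \log 2\pi + O(1/|t|) \\
&= it \big(\log|\tfrac{1}{2} + it| + i\, \mathrm{Arg}(\tfrac{1}{2} + it)\big) - (\tfrac{1}{2} + it) + \tfrac{1}{2} \log 2\pi + O(1/|t|) \\
&= -t\, \mathrm{Arg}(\tfrac{1}{2} + it) - \tfrac{1}{2} + \tfrac{1}{2} \log 2\pi + i\big(t \log|\tfrac{1}{2} + it| - t\big) + O(1/|t|).
\end{aligned}
\]

Taking the modulus, the imaginary part of the exponent drops out and we get

\[
\begin{aligned}
|\Gamma(\tfrac{1}{2} + it)| &= \exp\big(-t\, \mathrm{Arg}(\tfrac{1}{2} + it) - \tfrac{1}{2} + \tfrac{1}{2} \log 2\pi + O(1/|t|)\big) \\
&= \sqrt{2\pi}\, \exp\big(-t \arctan 2t - \tfrac{1}{2} + O(1/|t|)\big) \\
&= \sqrt{2\pi}\, \exp\big(-\tfrac{1}{2}\pi|t| + O(1/|t|)\big) \\
&= \sqrt{2\pi}\, \exp\big(-\tfrac{1}{2}\pi|t|\big) \big(1 + O(1/|t|)\big).
\end{aligned}
\]

Here we have used the expansion $t \arctan t = t(\tfrac{1}{2}\pi - \arctan(1/t)) = \tfrac{1}{2}\pi t - 1 + O(1/t)$,
as $t$ tends to infinity. For negative $t$ a similar expansion holds. Since
$2^{it} = \exp(it \log 2)$ has modulus one, substituting this expansion in (3.3) now
proves the first statement of the lemma. The argument of $\Gamma(\tfrac{1}{2} + it)$ satisfies

\[
\begin{aligned}
\mathrm{Arg}\big(\Gamma(\tfrac{1}{2} + it)\big) &= t \log|\tfrac{1}{2} + it| - t + O(1/|t|) \\
&= -t \log 2 + t \log|1 + 2it| - t + O(1/|t|).
\end{aligned}
\]

So, since $\mathrm{Arg}(2^{it}) = t \log 2$, we have

\[ \mathrm{Arg}(\phi_k(t)) = t \log \sqrt{1 + 4t^2} - t + O(1/|t|), \]

which proves the second and third statement of the lemma. □

Consider now the function $v_h$ defined in (2.9).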



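Lemma 5.2. We have the following order estimate for the $L^2$ norm of $v_h$.
For $h \to 0$

\[ \|v_h\|_2 = O\big(h^{\frac{1}{2} + \alpha}\, e^{\pi/(2h)}\big). \]

Proof. By Parseval's identity

\[ \|v_h\|_2^2 = \int_{-\infty}^{\infty} |v_h(x)|^2\,dx = \frac{1}{2\pi} \int_{-1}^{1} \Big|\frac{\phi_w(s)}{\phi_k(s/h)}\Big|^2 ds. \tag{5.2} \]

Write

\[ \int_{-1}^{1} \frac{|\phi_w(s)|^2}{|\phi_k(s/h)|^2}\,ds = \frac{1}{2} \int_{-1}^{1} |\phi_w(s)|^2 e^{\pi|s/h|}\,ds \tag{5.3} \]

\[ \qquad\qquad + \frac{1}{2} \int_{-1}^{1} |\phi_w(s)|^2 e^{\pi|s/h|} \Big(\frac{2 e^{-\pi|s/h|}}{|\phi_k(s/h)|^2} - 1\Big)\,ds. \tag{5.4} \]

The integral in (5.3) can be rewritten as

\[
\begin{aligned}
\int_{-1}^{1} |\phi_w(s)|^2 e^{\pi|s|/h}\,ds
&= e^{\pi/h} \int_{-1}^{1} |\phi_w(s)|^2 e^{\pi(|s/h| - (1/h))}\,ds \\
&= 2 e^{\pi/h} \int_{0}^{1} |\phi_w(s)|^2 e^{\pi(|s/h| - (1/h))}\,ds \\
&= 2 e^{\pi/h} h \int_{0}^{1/h} |\phi_w(1 - hv)|^2 e^{\pi(((1 - hv)/h) - (1/h))}\,dv \\
&= 2 e^{\pi/h} h^{1 + 2\alpha} \int_{0}^{1/h} \Big|\frac{\phi_w(1 - hv)}{(hv)^{\alpha}}\Big|^2 v^{2\alpha} e^{-\pi v}\,dv \\
&\sim 2 e^{\pi/h} h^{1 + 2\alpha} A^2 \int_{0}^{\infty} v^{2\alpha} e^{-\pi v}\,dv \\
&= 2 e^{\pi/h} h^{1 + 2\alpha} \pi^{-1 - 2\alpha} A^2\, \Gamma(2\alpha + 1),
\end{aligned}
\]

by the dominated convergence theorem. Omitting constants, we can rewrite
the integral (5.4) as

\[
\begin{aligned}
\int_{-1}^{1} |\phi_w(s)|^2 e^{\pi|s/h|} &\Big(\frac{2 e^{-\pi|s/h|}}{|\phi_k(s/h)|^2} - 1\Big)\,ds \\
&= e^{\pi/h} \int_{-1}^{1} |\phi_w(s)|^2 \Big(\frac{2 e^{-\pi|s/h|}}{|\phi_k(s/h)|^2} - 1\Big) e^{\pi(|s/h| - (1/h))}\,ds \\
&= 2 e^{\pi/h} \int_{0}^{1} |\phi_w(s)|^2 \Big(\frac{2 e^{-\pi|s/h|}}{|\phi_k(s/h)|^2} - 1\Big) e^{\pi(|s/h| - (1/h))}\,ds \\
&= 2 h^{1 + 2\alpha} e^{\pi/h} \int_{0}^{1/h} \Big|\frac{\phi_w(1 - hv)}{(hv)^{\alpha}}\Big|^2 \Big(\frac{2 e^{-\pi(1/h - v)}}{|\phi_k(1/h - v)|^2} - 1\Big) v^{2\alpha} e^{-\pi v}\,dv \\
&= 2 h^{1 + 2\alpha} e^{\pi/h}\, o(1),
\end{aligned}
\]

by the dominated convergence theorem. We have used the fact that both
the functions $\phi_w(1 - u)/u^{\alpha}$ and (see Lemma 5.1) $|(2\exp(-\pi u)/|\phi_k(u)|^2) - 1|$
are bounded and that the second function is of order $O(1/u)$ as $u$ tends
to infinity. This shows that the term (5.4) is negligible with respect to
(5.3). □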
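The order estimate of Lemma 5.2 can be illustrated numerically for the kernel with $\phi_w(s) = (1 - s^2)^3$, so that $\alpha = 3$ and $A = 8$. The sketch below uses the exact identity $1/|\phi_k(u)|^2 = \cosh(\pi u)$, which follows from (3.3) and the reflection formula for the gamma function, and compares $\int_{-1}^{1} |\phi_w(s)|^2 / |\phi_k(s/h)|^2\,ds$ with the leading term $e^{\pi/h} h^{1+2\alpha} \pi^{-1-2\alpha} A^2 \Gamma(2\alpha + 1)$. Both quantities are scaled by $e^{-\pi/h}$ to avoid overflow. Function names, the two bandwidths and the step count are assumptions for this illustration.

```python
import math


def scaled_integral(h, steps=20000):
    """exp(-pi/h) * integral over [-1, 1] of (1 - s^2)^6 cosh(pi s / h) ds (Simpson)."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps + 1):
        s = i * width
        weight = 1.0 if i in (0, steps) else (4.0 if i % 2 else 2.0)
        # cosh(pi s / h) * exp(-pi / h), written so that nothing overflows.
        scaled_cosh = 0.5 * (math.exp(math.pi * (s - 1.0) / h)
                             + math.exp(-math.pi * (s + 1.0) / h))
        total += weight * (1.0 - s * s) ** 6 * scaled_cosh
    # Factor 2 because the integrand is even in s.
    return 2.0 * total * width / 3.0


def leading_term(h, alpha=3.0, A=8.0):
    """h^(1 + 2 alpha) * pi^(-1 - 2 alpha) * A^2 * Gamma(2 alpha + 1)."""
    return (h ** (1 + 2 * alpha) * math.pi ** (-1 - 2 * alpha)
            * A * A * math.gamma(2 * alpha + 1))


ratio_coarse = scaled_integral(0.05) / leading_term(0.05)
ratio_fine = scaled_integral(0.01) / leading_term(0.01)
```

The ratio approaches one as $h$ decreases, in agreement with the asymptotic equivalence obtained from the dominated convergence argument.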



Corollary 5.3. The $L^2$-norm of the function $\mathbf{v}_h$, defined in (3.5), is of order
$O\big(h^{p(\frac{1}{2} + \alpha)}\, e^{p\pi/(2h)}\big)$.

Proof. This follows from the product form of $\mathbf{v}_h$, given by
$\mathbf{v}_h(\mathbf{s}) = \prod_{j=1}^{p} v_h(s_j)$. □



Proof of Theorem 4.1. The expansion (4.6) follows from Theorem 1 in
Masry (1991). To prove the variance bound (4.7) we argue as in the proof
of Theorem 2 in the same paper. First we give a bound on the variance in
terms of the $L^2$-norm of the function $\mathbf{v}_h$ and then we exploit the asymptotic
expansion of the characteristic function $\phi_k$ as given in Lemma 5.1 to get a
sharper bound on the $L^2$-norm of $\mathbf{v}_h$ than Masry in his Proposition 3, by
taking the behaviour of $\phi_w$ at the boundary of its support into account.
Some details follow.

Arguing as in Masry (1991) we can show that

Var/ nfc (x) = 0(Mt + Af nfc ) ) 
with (up to a multiplicative constant) 

1 n 



nh 2 P 

3=V 



where $W_j = \mathbf{v}_h\big((\mathbf{x} - \log \mathbf{X}_j^2)/h\big)$.

Applying a lemma by Deo (1976), we can bound, for a strongly mixing
process $X$ with mixing coefficients $\alpha_j$, the term $M_{nh}$ by a constant (not
depending on $n$ and $h$) times

which, by stationarity, becomes 



nh 2 P 

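Observe now that, by boundedness of the density of $\log \mathbf{X}_0^2$, the term
$\mathrm{E}\,|W_0|^{2/(1-\beta)}$ can be bounded by a constant times $h^p \|\mathbf{v}_h\|_{2/(1-\beta)}^{2/(1-\beta)}$ and that
we can therefore write

\[ \mathrm{Var}\, f_{nh}(\mathbf{x}) = O\Big(\frac{\|\mathbf{v}_h\|_2^2}{nh^p} + \frac{\sum_{j} \alpha_j^{\beta}\; \|\mathbf{v}_h\|_{2/(1-\beta)}^2}{nh^{p(1+\beta)}}\Big). \]

The proof will be finished by application of Corollary 5.3, which gives the
$L^2$-norm of $\mathbf{v}_h$, and an estimate of the $L^{2/(1-\beta)}$-norm of $\mathbf{v}_h$. For the latter one we
have the inequalities $\|\mathbf{v}_h\|_{2/(1-\beta)} \le \|\mathbf{v}_h\|_2^{1-\beta}\, \|\mathbf{v}_h\|_{\infty}^{\beta}$ and $\|\mathbf{v}_h\|_{\infty} \le C \|\mathbf{v}_h\|_2$
for some constant $C$ by the fact that $\phi_w$ has compact support. As a result
we get $\|\mathbf{v}_h\|_{2/(1-\beta)} \le C \|\mathbf{v}_h\|_2$ and that $M_{nh}$ is less than a constant times
$\|\mathbf{v}_h\|_2^2 / (nh^{p(1+\beta)})$. The bound on $\mathrm{Var}\, f_{nh}(\mathbf{x})$ of Theorem 4.1 now follows. □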



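Proof of Theorem 4.2. Let $\mathcal{F}_{\sigma}$ be the $\sigma$-algebra generated by the process
$\sigma$. We use the decomposition

\[ \mathrm{Var}\, f_{nh}(\mathbf{x}) = \mathrm{E}\, \mathrm{Var}\,(f_{nh}(\mathbf{x}) \mid \mathcal{F}_{\sigma}) + \mathrm{Var}\, \tilde{f}_{nh}(\mathbf{x}), \tag{5.6} \]

with $\tilde{f}_{nh}(\mathbf{x})$ as in Remark 4.6. We now consider the first term in (5.6). Let
$\mathbf{z}_j = (\log Z_j^2, \ldots, \log Z_{j-p+1}^2)$ and $\mathbf{q}_j = (\log \sigma_j^2, \ldots, \log \sigma_{j-p+1}^2)$. Since the
$Z_i$ are independent given $\mathcal{F}_{\sigma}$ we can bound the conditional variance by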



j=p 

which is by conditional independence and stationarity equal to 
1 f i ^ x-q -z , 2 , , C ., ||2 

with $C$ the maximum of $\mathbf{k}$, the density of $\mathbf{z}_0$. Therefore the first term in
(5.6) is of order $\|\mathbf{v}_h\|_2^2 / (nh^p)$, so of order $O\big(h^{p(1+2\alpha)} e^{p\pi/h} / (nh^p)\big)$.
The second term of (5.6) is treated next. We have with $U_j = \mathbf{w}\big((\mathbf{x} - \mathbf{q}_j)/h\big)$

\[ \mathrm{Var}\, \tilde{f}_{nh}(\mathbf{x}) = \frac{1}{n^2 h^{2p}} \sum_{i} \mathrm{Var}\, U_i + \frac{2}{n^2 h^{2p}} \sum_{i < j} \mathrm{Cov}(U_i, U_j). \]

The first term reduces by stationarity to $\frac{1}{nh^{2p}} \mathrm{Var}\, U_1$, which can be bounded
by a constant times $\|\mathbf{w}\|_2^2 / (nh^p)$, since $(\log \sigma_1^2, \ldots, \log \sigma_p^2)$ has by assumption
a bounded density. For the second term we proceed as in the proof of
Theorem 4.1. Using stationarity we write it as

\[ \frac{2}{n^2 h^{2p}} \sum_{k=1}^{n} (n - k)\, \mathrm{Cov}(U_k, U_0). \]

We split the summation into two parts. In the first part we consider

\[ \sum_{k=1}^{p-1} (n - k)\, \mathrm{Cov}(U_k, U_0), \]

whose absolute value can be bounded in view of the Cauchy-Schwarz
inequality and stationarity by $(p - 1)\, n\, \mathrm{E}\, U_0^2$, which is of order $(p - 1)\, n h^p \|\mathbf{w}\|_2^2$.
The absolute value of the second part

\[ \sum_{k=p}^{n} (n - k)\, \mathrm{Cov}(U_k, U_0) \]

can be bounded by invoking once more Deo's result by a constant times

\[ n \sum_{k=p}^{n} \alpha_{k-p+1}^{\beta}\, \big(\mathrm{E}\,|U_0|^{2/(1-\beta)}\big)^{1-\beta}, \]

which is less than a constant times

\[ n h^{p(1-\beta)}\, \|\mathbf{w}\|_{2/(1-\beta)}^2 \sum_{k} \alpha_{k-p+1}^{\beta}. \]

Hence we have that $\mathrm{Var}\, \tilde{f}_{nh}(\mathbf{x})$ is of order $1/(nh^{p(1+\beta)})$.

Combining the obtained order estimates for the two terms of (5.6) and
using the $L^2$-norm of the function $\mathbf{v}_h$ gives the desired result. □